Enhancing Supervised Learning with Unlabeled Data

Authors

  • Sally A. Goldman
  • Yan Zhou
Abstract

In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively studied. We present a new "co-training" strategy for using unlabeled data to improve the performance of standard supervised learning algorithms. Unlike much of the prior work, such as the co-training procedure of Blum and Mitchell (1998), we do not assume there are two redundant views, both of which are sufficient for classification. The only requirement our co-training strategy places on each supervised learning algorithm is that its hypothesis partitions the example space into a set of equivalence classes (e.g., for a decision tree, each leaf defines an equivalence class). We evaluate our co-training strategy via experiments using data from the UCI repository.
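
The abstract only sketches the idea (two different supervised learners, each of whose hypotheses partitions the example space, labeling data for one another). The snippet below is a minimal illustrative sketch in Python of a generic co-training loop in that spirit; the classifier choices, the fixed confidence threshold, and the round limit are assumptions made for illustration, not the exact procedure of Goldman and Zhou.

```python
# Illustrative sketch only: a generic co-training loop in which two *different*
# supervised learners label confident unlabeled examples for each other.
# The classifiers, threshold, and round limit are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

def co_train(X_labeled, y_labeled, X_unlabeled, rounds=10, conf=0.9):
    learners = [DecisionTreeClassifier(max_depth=5), GaussianNB()]
    # Each learner starts from the same small labeled set.
    train = [(X_labeled.copy(), y_labeled.copy()) for _ in learners]
    pool = X_unlabeled.copy()

    for _ in range(rounds):
        for i, learner in enumerate(learners):
            if len(pool) == 0:
                break
            Xi, yi = train[i]
            learner.fit(Xi, yi)
            proba = learner.predict_proba(pool)
            picks = np.where(proba.max(axis=1) >= conf)[0]
            if picks.size == 0:
                continue
            pseudo_labels = learner.classes_[proba[picks].argmax(axis=1)]
            # Confident predictions of learner i augment the *other* learner's set.
            j = 1 - i
            Xj, yj = train[j]
            train[j] = (np.vstack([Xj, pool[picks]]),
                        np.concatenate([yj, pseudo_labels]))
            pool = np.delete(pool, picks, axis=0)

    # Final hypotheses: retrain each learner on its augmented training set.
    for (Xi, yi), learner in zip(train, learners):
        learner.fit(Xi, yi)
    return learners
```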

Similar articles

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach, using a random forest as the base classifier, to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised approach uses unlabeled data together with a small amount of labelled data to build a better classifier. The results obtained by the proposed approach are compared with those o...
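
As a rough illustration of the kind of pipeline this abstract describes (a random-forest base classifier trained from a small labelled set plus unlabeled data), the sketch below uses scikit-learn's SelfTrainingClassifier. The synthetic data, the 50-example labelled subset, and the 0.9 confidence threshold are placeholders, not the study's actual setup.

```python
# A minimal sketch, assuming scikit-learn: self-training with a random forest
# base classifier. Unlabeled rows are marked with -1 in the target vector,
# which is how SelfTrainingClassifier expects them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                  # placeholder features
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder binary target

y_train = y_true.copy()
y_train[50:] = -1                              # pretend only 50 labels are known

model = SelfTrainingClassifier(RandomForestClassifier(n_estimators=100),
                               threshold=0.9)
model.fit(X, y_train)                          # pseudo-labels confident unlabeled rows

accuracy = (model.predict(X) == y_true).mean() # check against the held-back labels
print(f"accuracy on all examples: {accuracy:.3f}")
```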

Full text

On Efficient Large Margin Semisupervised Learning: Method and Theory

In classification, semisupervised learning usually involves a large amount of unlabeled data with only a small amount of labeled data. This imposes a great challenge in that it is difficult to achieve good classification performance through labeled data alone. To leverage unlabeled data for enhancing classification, this article introduces a large margin semisupervised learning method within th...

Full text

Enhancing Supervised Learning with Unlabeled Data (in ICML 2000)

In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively studied. We present a new "co-training" strategy for using unlabeled data to improve the performance of standard supervised learning algorithms. Unlike much of the prior work, such as the co-training pro...

Full text

Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning

Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data may contain instances belonging to classes beyond the predefined labeled categories. This problem has been widely studied in recent year...

Full text

Learning for Real-World Image Applications

We study visual learning models that could work efficiently with little ground-truth annotation and a mass of noisy unlabeled data for large-scale Web image applications, following the subroutine of semi-supervised learning (SSL) that has been deeply investigated in various visual classification tasks. However, most previous SSL approaches are not able to incorporate multiple descriptions f...

Full text

Journal:

Volume   Issue

Pages  -

Publication date: 2000